Statistical properties of the warped discrete cosine transform cepstrum compared with MFCC

نویسندگان

Rangarao Muralishankar

Abhijeet Sangwan

Douglas D. O'Shaughnessy

چکیده

In this paper, we continue our investigation of the warped discrete cosine transform cepstrum (WDCTC), which was earlier introduced as a new speech processing feature [1]. Here, we study the statistical properties of the WDCTC and compare them with the mel-frequency cepstral coefficients (MFCC). We report some interesting properties of the WDCTC when compared to the MFCC: its statistical distribution is more Gaussian-like with lower variance, it obtains better vowel cluster separability, it forms tighter vowel clusters and generates better codebooks. Further, we employ the WDCTC and MFCC features in a 5-vowel recognition task using Vector Quantization (VQ) and 1-Nearest Neighbour (1-NN) as classifiers. In our experiments, the WDCTC consistently outperforms the MFCC.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Implementing frequency-warping and VTLN through linear transformation of conventional MFCC

In this paper, we show that frequency-warping (including VTLN) can be implemented through linear transformation of conventional MFCC. Unlike the Pitz-Ney [1] continuous domain approach, we directly determine the relation between frequency-warping and the linear-transformation in the discrete-domain. The advantage of such an approach is that it can be applied to any frequency-warping and is not ...

متن کامل

Speaker Identification using MFCC-Domain Support Vector Machine

Speech recognition and speaker identification are important for authentication and verification in security purpose, but they are difficult to achieve. Speaker identification methods can be divided into textindependent and text-dependent. This paper presents a technique of text-dependent speaker identification using MFCC-domain support vector machine (SVM). In this work, melfrequency cepstrum c...

متن کامل

Amplitude modulation features for emotion recognition from speech

The goal of speech emotion recognition (SER) is to identify the emotional or physical state of a human being from his or her voice. One of the most important things in a SER task is to extract and select relevant speech features with which most emotions could be recognized. In this paper, we present a smoothed nonlinear energy operator (SNEO)-based amplitude modulation cepstral coefficients (AM...

متن کامل

Mel-scaled Wavelet Filter Base Unvoiced Phoneme Re

In this paper we propose a filter bank structure derived by using admissible wavelet packet transform. These filters have Mel scale spacing and have an advantage of easy implementation with higher resolution in time-frequency domain because of wavelet transform. The features are obtained by first calculating the energy in each filter band and then applying the Discrete Cosine Transform (DCT) to...

متن کامل

Noise-Robust Speech Features Based on Cepstral Time Coefficients

In this paper, we investigate the noise-robustness of features based on the cepstral time coefficients (CTC). By cepstral time coefficients, we mean the coefficients obtained from applying the discrete cosine transform to the commonly used mel-frequency cepstral coefficients (MFCC). Furthermore, we apply temporal filters used for computing delta and acceleration dynamic features to the CTC, res...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2005

Statistical properties of the warped discrete cosine transform cepstrum compared with MFCC

نویسندگان

چکیده

منابع مشابه

Implementing frequency-warping and VTLN through linear transformation of conventional MFCC

Speaker Identification using MFCC-Domain Support Vector Machine

Amplitude modulation features for emotion recognition from speech

Mel-scaled Wavelet Filter Base Unvoiced Phoneme Re

Noise-Robust Speech Features Based on Cepstral Time Coefficients

عنوان ژورنال:

اشتراک گذاری